Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 19302 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.9 MiB |
| Average record size in memory | 104.0 B |
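The counts in this table can be reproduced from the raw rows. The following is a minimal standard-library sketch, not the profiler's own code: the helper name `dataset_statistics` is hypothetical, and the duplicate-counting rule (each repeat of an earlier row counts once) is an assumption, since the report does not state how duplicates are tallied.

```python
from collections import Counter

def dataset_statistics(rows, columns):
    """Summarise a table given as a list of tuples; None marks a missing cell."""
    n_obs, n_vars = len(rows), len(columns)
    total_cells = n_obs * n_vars
    missing_cells = sum(value is None for row in rows for value in row)
    # Assumption: every repeat of an earlier row counts as one duplicate row.
    duplicate_rows = sum(count - 1 for count in Counter(rows).values())
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing_cells,
        "missing_cells_pct": 100 * missing_cells / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicate_rows,
        "duplicate_rows_pct": 100 * duplicate_rows / n_obs if n_obs else 0.0,
    }

# Three columns of the first three sample rows, used only as toy input.
columns = ("category", "amt", "is_fraud")
rows = [
    ("grocery_pos", 281.06, 1),
    ("gas_transport", 11.52, 1),
    ("grocery_pos", 276.31, 1),
]
stats = dataset_statistics(rows, columns)

# Memory check with the reported figures: 19302 records at 104.0 B each.
total_mib = 19302 * 104.0 / 2**20
```

Rounding `total_mib` to one decimal gives 1.9, in line with the reported total size in memory.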
Variable types
| Categorical | 2 |
|---|---|
| Numeric | 11 |
| amt is highly correlated with is_fraud | High correlation |
|---|---|
| zip is highly correlated with long and 1 other field | High correlation |
| lat is highly correlated with merch_lat | High correlation |
| long is highly correlated with zip and 1 other field | High correlation |
| merch_lat is highly correlated with lat | High correlation |
| merch_long is highly correlated with zip and 1 other field | High correlation |
| is_fraud is highly correlated with amt | High correlation |
| category is highly correlated with amt and 2 other fields | High correlation |
| amt is highly correlated with category | High correlation |
| zip is highly correlated with lat and 3 other fields | High correlation |
| lat is highly correlated with zip and 3 other fields | High correlation |
| long is highly correlated with zip and 3 other fields | High correlation |
| merch_lat is highly correlated with zip and 3 other fields | High correlation |
| merch_long is highly correlated with zip and 3 other fields | High correlation |
| hour is highly correlated with category and 1 other field | High correlation |
| is_fraud is highly correlated with category and 1 other field | High correlation |
| is_fraud is uniformly distributed | Uniform |
| hour has 1143 (5.9%) zeros | Zeros |
| day has 3446 (17.9%) zeros | Zeros |
Reproduction
| Analysis started | 2023-02-02 02:50:00.141681 |
|---|---|
| Analysis finished | 2023-02-02 02:50:13.862423 |
| Duration | 13.72 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
category
Categorical
| Distinct | 14 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 150.9 KiB |
Length
| Max length | 14 |
|---|---|
| Median length | 11 |
| Mean length | 10.68687183 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | grocery_pos |
|---|---|
| 2nd row | gas_transport |
| 3rd row | grocery_pos |
| 4th row | gas_transport |
| 5th row | grocery_pos |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| grocery_pos | 3180 | 16.5% |
| shopping_net | 2922 | 15.1% |
| shopping_pos | 1940 | 10.1% |
| gas_transport | 1752 | 9.1% |
| misc_net | 1605 | 8.3% |
| home | 1185 | 6.1% |
| kids_pets | 1159 | 6.0% |
| entertainment | 967 | 5.0% |
| misc_pos | 955 | 4.9% |
| personal_care | 937 | 4.9% |
| Other values (4) | 2700 | 14.0% |
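Each Frequency (%) entry is the row's count divided by the 19302 observations reported under dataset statistics. A minimal Python sketch of that arithmetic, using counts copied from the table above (the helper name `frequency_pct` is illustrative and not part of pandas-profiling):

```python
N_OBSERVATIONS = 19302  # "Number of observations" from the dataset statistics

def frequency_pct(count, total=N_OBSERVATIONS):
    """Return a count as a percentage of all observations, rounded to 0.1."""
    return round(100 * count / total, 1)

# Counts copied from the Common Values table.
counts = {"grocery_pos": 3180, "home": 1185, "kids_pets": 1159, "entertainment": 967}
frequencies = {value: frequency_pct(count) for value, count in counts.items()}
```

For `home`, `kids_pets` and `entertainment` this reproduces the 6.1%, 6.0% and 5.0% shown in the table; the same division applies to rows whose percentage cell is blank.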
Length
Histogram of lengths of the category
amt
Real number (ℝ≥0)
| Distinct | 13938 |
|---|---|
| Distinct (%) | 72.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 299.4012859 |
| Minimum | 1 |
| Maximum | 7508.46 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 4.09 |
| Q1 | 20.35 |
| median | 88.015 |
| Q3 | 479.595 |
| 95-th percentile | 1027.338 |
| Maximum | 7508.46 |
| Range | 7507.46 |
| Interquartile range (IQR) | 459.245 |
Descriptive statistics
| Standard deviation | 375.6726194 |
|---|---|
| Coefficient of variation (CV) | 1.254746179 |
| Kurtosis | 8.056669195 |
| Mean | 299.4012859 |
| Median Absolute Deviation (MAD) | 81.165 |
| Skewness | 1.554209483 |
| Sum | 5779043.62 |
| Variance | 141129.917 |
| Monotonicity | Not monotonic |
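Several entries in the quantile and descriptive statistics are derived from others, which gives a quick consistency check on the report. The sketch below recomputes four of them from the reported values; the variable names are illustrative, and only figures printed in the tables above are used.

```python
# Values copied from the quantile and descriptive statistics tables above.
minimum, maximum = 1.0, 7508.46
q1, q3 = 20.35, 479.595
mean, std = 299.4012859, 375.6726194

value_range = maximum - minimum   # reported Range: 7507.46
iqr = q3 - q1                     # reported IQR: 459.245
cv = std / mean                   # reported CV: 1.254746179
variance = std ** 2               # reported Variance: 141129.917
```

A coefficient of variation above 1, together with a mean (299.40) far above the median (88.015), points to a strongly right-skewed amount distribution.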
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 9.94 | 11 | 0.1% |
| 8.29 | 11 | 0.1% |
| 9 | 11 | 0.1% |
| 9.02 | 11 | 0.1% |
| 9.95 | 10 | 0.1% |
| 8.73 | 10 | 0.1% |
| 1.12 | 10 | 0.1% |
| 4.92 | 9 | < 0.1% |
| 7.23 | 9 | < 0.1% |
| 9.15 | 9 | < 0.1% |
| Other values (13928) | 19201 |
| Value | Count | Frequency (%) |
| 1 | 3 | |
| 1.01 | 6 | |
| 1.02 | 5 | |
| 1.03 | 4 | |
| 1.05 | 6 | |
| 1.06 | 4 | |
| 1.07 | 3 | |
| 1.08 | 4 | |
| 1.09 | 1 | < 0.1% |
| 1.1 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 7508.46 | 1 | |
| 4673.39 | 1 | |
| 3304.44 | 1 | |
| 3066.61 | 1 | |
| 2967.92 | 1 | |
| 2717.69 | 1 | |
| 2162.2 | 1 | |
| 2105.36 | 1 | |
| 2025.15 | 1 | |
| 1866.15 | 1 |
zip
Real number (ℝ≥0)
| Distinct | 985 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 48786.68179 |
| Minimum | 1257 |
| Maximum | 99921 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | 1257 |
|---|---|
| 5-th percentile | 7208 |
| Q1 | 26041 |
| median | 48043 |
| Q3 | 72011 |
| 95-th percentile | 95453 |
| Maximum | 99921 |
| Range | 98664 |
| Interquartile range (IQR) | 45970 |
Descriptive statistics
| Standard deviation | 27049.76004 |
|---|---|
| Coefficient of variation (CV) | 0.5544496785 |
| Kurtosis | -1.083789183 |
| Mean | 48786.68179 |
| Median Absolute Deviation (MAD) | 22994 |
| Skewness | 0.100886626 |
| Sum | 941680532 |
| Variance | 731689518 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 16034 | 52 | 0.3% |
| 73754 | 51 | 0.3% |
| 48088 | 51 | 0.3% |
| 61454 | 49 | 0.3% |
| 62262 | 44 | 0.2% |
| 99160 | 43 | 0.2% |
| 59448 | 43 | 0.2% |
| 82514 | 42 | 0.2% |
| 16858 | 42 | 0.2% |
| 91206 | 42 | 0.2% |
| Other values (975) | 18843 |
| Value | Count | Frequency (%) |
| 1257 | 19 | |
| 1330 | 19 | |
| 1535 | 13 | 0.1% |
| 1545 | 15 | |
| 1612 | 14 | |
| 1843 | 34 | |
| 1844 | 30 | |
| 2180 | 14 | |
| 2630 | 29 | |
| 2908 | 17 |
| Value | Count | Frequency (%) |
| 99921 | 14 | 0.1% |
| 99783 | 22 | |
| 99747 | 12 | 0.1% |
| 99746 | 14 | 0.1% |
| 99323 | 23 | |
| 99160 | 43 | |
| 99116 | 15 | 0.1% |
| 99113 | 14 | 0.1% |
| 99033 | 37 | |
| 98836 | 16 | 0.1% |
lat
Real number (ℝ≥0)
| Distinct | 983 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.61214043 |
| Minimum | 20.0271 |
| Maximum | 66.6933 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | 20.0271 |
|---|---|
| 5-th percentile | 29.9912 |
| Q1 | 34.7789 |
| median | 39.39 |
| Q3 | 42.0158 |
| 95-th percentile | 45.8433 |
| Maximum | 66.6933 |
| Range | 46.6662 |
| Interquartile range (IQR) | 7.2369 |
Descriptive statistics
| Standard deviation | 5.126584113 |
|---|---|
| Coefficient of variation (CV) | 0.1327713008 |
| Kurtosis | 1.468351282 |
| Mean | 38.61214043 |
| Median Absolute Deviation (MAD) | 3.3343 |
| Skewness | -0.04440826191 |
| Sum | 745291.5345 |
| Variance | 26.28186466 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 40.8555 | 52 | 0.3% |
| 42.5164 | 51 | 0.3% |
| 36.385 | 51 | 0.3% |
| 40.6761 | 49 | 0.3% |
| 38.9311 | 44 | 0.2% |
| 48.8878 | 43 | 0.2% |
| 48.2777 | 43 | 0.2% |
| 34.1556 | 42 | 0.2% |
| 26.4215 | 42 | 0.2% |
| 41.0001 | 42 | 0.2% |
| Other values (973) | 18843 |
| Value | Count | Frequency (%) |
| 20.0271 | 19 | |
| 20.0827 | 22 | |
| 24.6557 | 29 | |
| 26.1184 | 41 | |
| 26.3304 | 15 | 0.1% |
| 26.3771 | 10 | 0.1% |
| 26.4215 | 42 | |
| 26.4722 | 37 | |
| 26.529 | 20 | |
| 26.6939 | 16 | 0.1% |
| Value | Count | Frequency (%) |
| 66.6933 | 12 | 0.1% |
| 65.6899 | 14 | 0.1% |
| 64.7556 | 22 | |
| 55.4732 | 14 | 0.1% |
| 48.8878 | 43 | |
| 48.8856 | 22 | |
| 48.8328 | 29 | |
| 48.6669 | 21 | |
| 48.6031 | 22 | |
| 48.4786 | 26 |
long
Real number (ℝ)
| Distinct | 983 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -90.3445627 |
| Minimum | -165.6723 |
| Maximum | -67.9503 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 19302 |
| Negative (%) | 100.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | -165.6723 |
|---|---|
| 5-th percentile | -120.1922 |
| Q1 | -96.8094 |
| median | -87.5917 |
| Q3 | -80.1629 |
| 95-th percentile | -73.3113 |
| Maximum | -67.9503 |
| Range | 97.722 |
| Interquartile range (IQR) | 16.6465 |
Descriptive statistics
| Standard deviation | 14.09176155 |
|---|---|
| Coefficient of variation (CV) | -0.1559779707 |
| Kurtosis | 1.958082009 |
| Mean | -90.3445627 |
| Median Absolute Deviation (MAD) | 8.2675 |
| Skewness | -1.191739657 |
| Sum | -1743830.749 |
| Variance | 198.5777437 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -79.7372 | 52 | 0.3% |
| -98.0727 | 51 | 0.3% |
| -82.9832 | 51 | 0.3% |
| -91.0391 | 49 | 0.3% |
| -89.2463 | 44 | 0.2% |
| -118.2105 | 43 | 0.2% |
| -112.8456 | 43 | 0.2% |
| -99.0025 | 42 | 0.2% |
| -73.098 | 42 | 0.2% |
| -79.7856 | 42 | 0.2% |
| Other values (973) | 18843 |
| Value | Count | Frequency (%) |
| -165.6723 | 22 | |
| -156.292 | 14 | |
| -155.488 | 22 | |
| -155.3697 | 19 | |
| -153.994 | 12 | |
| -133.1171 | 14 | |
| -124.4409 | 22 | |
| -124.2174 | 26 | |
| -124.1587 | 17 | |
| -124.1437 | 27 |
| Value | Count | Frequency (%) |
| -67.9503 | 31 | |
| -68.5565 | 19 | |
| -69.2675 | 14 | 0.1% |
| -69.4828 | 26 | |
| -69.9576 | 10 | 0.1% |
| -69.9656 | 38 | |
| -70.1031 | 9 | < 0.1% |
| -70.239 | 10 | 0.1% |
| -70.3001 | 29 | |
| -70.3457 | 32 |
city_pop
Real number (ℝ≥0)
| Distinct | 891 |
|---|---|
| Distinct (%) | 4.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 88944.23977 |
| Minimum | 23 |
| Maximum | 2906700 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | 23 |
|---|---|
| 5-th percentile | 140 |
| Q1 | 760 |
| median | 2526 |
| Q3 | 19685 |
| 95-th percentile | 525713 |
| Maximum | 2906700 |
| Range | 2906677 |
| Interquartile range (IQR) | 18925 |
Descriptive statistics
| Standard deviation | 301863.0315 |
|---|---|
| Coefficient of variation (CV) | 3.393845766 |
| Kurtosis | 38.23944993 |
| Mean | 88944.23977 |
| Median Absolute Deviation (MAD) | 2263 |
| Skewness | 5.628461696 |
| Sum | 1716801716 |
| Variance | 9.112128981 × 10¹⁰ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 606 | 75 | 0.4% |
| 1263321 | 67 | 0.3% |
| 302 | 65 | 0.3% |
| 2906700 | 65 | 0.3% |
| 1595797 | 61 | 0.3% |
| 601723 | 59 | 0.3% |
| 1126 | 58 | 0.3% |
| 1312922 | 58 | 0.3% |
| 1577385 | 57 | 0.3% |
| 471 | 57 | 0.3% |
| Other values (881) | 18680 |
| Value | Count | Frequency (%) |
| 23 | 19 | |
| 37 | 13 | 0.1% |
| 43 | 9 | < 0.1% |
| 46 | 37 | |
| 47 | 9 | < 0.1% |
| 49 | 24 | |
| 51 | 23 | |
| 52 | 17 | |
| 53 | 38 | |
| 60 | 22 |
| Value | Count | Frequency (%) |
| 2906700 | 65 | |
| 2504700 | 25 | 0.1% |
| 2383912 | 13 | 0.1% |
| 1595797 | 61 | |
| 1577385 | 57 | |
| 1526206 | 39 | |
| 1417793 | 8 | < 0.1% |
| 1382480 | 19 | 0.1% |
| 1312922 | 58 | |
| 1263321 | 67 |
merch_lat
Real number (ℝ≥0)
| Distinct | 19295 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.60800665 |
| Minimum | 19.161782 |
| Maximum | 67.510267 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | 19.161782 |
|---|---|
| 5-th percentile | 29.8023046 |
| Q1 | 34.880532 |
| median | 39.4115625 |
| Q3 | 41.9913195 |
| 95-th percentile | 46.0132691 |
| Maximum | 67.510267 |
| Range | 48.348485 |
| Interquartile range (IQR) | 7.1107875 |
Descriptive statistics
| Standard deviation | 5.164621843 |
|---|---|
| Coefficient of variation (CV) | 0.1337707458 |
| Kurtosis | 1.447753867 |
| Mean | 38.60800665 |
| Median Absolute Deviation (MAD) | 3.347456 |
| Skewness | -0.04791012658 |
| Sum | 745211.7444 |
| Variance | 26.67331879 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 38.401561 | 2 | < 0.1% |
| 40.876763 | 2 | < 0.1% |
| 43.067486 | 2 | < 0.1% |
| 41.222094 | 2 | < 0.1% |
| 40.931109 | 2 | < 0.1% |
| 33.029036 | 2 | < 0.1% |
| 40.411281 | 2 | < 0.1% |
| 41.319753 | 1 | < 0.1% |
| 38.945437 | 1 | < 0.1% |
| 48.431451 | 1 | < 0.1% |
| Other values (19285) | 19285 |
| Value | Count | Frequency (%) |
| 19.161782 | 1 | |
| 19.215318 | 1 | |
| 19.238269 | 1 | |
| 19.313894 | 1 | |
| 19.393922 | 1 | |
| 19.399206 | 1 | |
| 19.425114 | 1 | |
| 19.531144 | 1 | |
| 19.607092 | 1 | |
| 19.608886 | 1 |
| Value | Count | Frequency (%) |
| 67.510267 | 1 | |
| 67.441518 | 1 | |
| 67.397018 | 1 | |
| 67.188111 | 1 | |
| 67.064277 | 1 | |
| 66.835174 | 1 | |
| 66.605445 | 1 | |
| 66.591565 | 1 | |
| 66.410651 | 1 | |
| 66.357215 | 1 |
merch_long
Real number (ℝ)
| Distinct | 19295 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -90.34688463 |
| Minimum | -166.558056 |
| Maximum | -66.960745 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 19302 |
| Negative (%) | 100.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | -166.558056 |
|---|---|
| 5-th percentile | -120.125384 |
| Q1 | -96.9514715 |
| median | -87.493075 |
| Q3 | -80.237604 |
| 95-th percentile | -73.2030188 |
| Maximum | -66.960745 |
| Range | 99.597311 |
| Interquartile range (IQR) | 16.7138675 |
Descriptive statistics
| Standard deviation | 14.10718771 |
|---|---|
| Coefficient of variation (CV) | -0.1561447056 |
| Kurtosis | 1.95989688 |
| Mean | -90.34688463 |
| Median Absolute Deviation (MAD) | 8.287123 |
| Skewness | -1.189822007 |
| Sum | -1743875.567 |
| Variance | 199.012745 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -77.807972 | 2 | < 0.1% |
| -82.333658 | 2 | < 0.1% |
| -122.658093 | 2 | < 0.1% |
| -81.659016 | 2 | < 0.1% |
| -82.976778 | 2 | < 0.1% |
| -96.737923 | 2 | < 0.1% |
| -76.926858 | 2 | < 0.1% |
| -96.354157 | 1 | < 0.1% |
| -87.420652 | 1 | < 0.1% |
| -93.884543 | 1 | < 0.1% |
| Other values (19285) | 19285 |
| Value | Count | Frequency (%) |
| -166.558056 | 1 | |
| -166.550779 | 1 | |
| -166.478734 | 1 | |
| -166.403973 | 1 | |
| -166.163025 | 1 | |
| -166.107063 | 1 | |
| -166.080207 | 1 | |
| -166.067029 | 1 | |
| -165.986117 | 1 | |
| -165.914542 | 1 |
| Value | Count | Frequency (%) |
| -66.960745 | 1 | |
| -67.154141 | 1 | |
| -67.38903 | 1 | |
| -67.392489 | 1 | |
| -67.394711 | 1 | |
| -67.494118 | 1 | |
| -67.503251 | 1 | |
| -67.533581 | 1 | |
| -67.569238 | 1 | |
| -67.618547 | 1 |
age
Real number (ℝ≥0)
| Distinct | 83 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 47.56196249 |
| Minimum | 14 |
| Maximum | 96 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | 14 |
|---|---|
| 5-th percentile | 22 |
| Q1 | 33 |
| median | 46 |
| Q3 | 59 |
| 95-th percentile | 82 |
| Maximum | 96 |
| Range | 82 |
| Interquartile range (IQR) | 26 |
Descriptive statistics
| Standard deviation | 18.03992112 |
|---|---|
| Coefficient of variation (CV) | 0.3792930354 |
| Kurtosis | -0.4138004026 |
| Mean | 47.56196249 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 0.5003906678 |
| Sum | 918041 |
| Variance | 325.4387541 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 33 | 513 | 2.7% |
| 47 | 503 | 2.6% |
| 34 | 496 | 2.6% |
| 35 | 481 | 2.5% |
| 48 | 480 | 2.5% |
| 32 | 476 | 2.5% |
| 46 | 454 | 2.4% |
| 43 | 442 | 2.3% |
| 30 | 437 | 2.3% |
| 49 | 412 | 2.1% |
| Other values (73) | 14608 |
| Value | Count | Frequency (%) |
| 14 | 10 | 0.1% |
| 15 | 37 | 0.2% |
| 16 | 97 | 0.5% |
| 17 | 15 | 0.1% |
| 18 | 86 | 0.4% |
| 19 | 94 | 0.5% |
| 20 | 220 | |
| 21 | 227 | |
| 22 | 365 | |
| 23 | 314 |
| Value | Count | Frequency (%) |
| 96 | 7 | < 0.1% |
| 95 | 1 | < 0.1% |
| 94 | 60 | |
| 93 | 66 | |
| 92 | 104 | |
| 91 | 73 | |
| 90 | 80 | |
| 89 | 79 | |
| 88 | 50 | |
| 87 | 72 |
hour
Real number (ℝ≥0)
| Distinct | 24 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.39954409 |
| Minimum | 0 |
| Maximum | 23 |
| Zeros | 1143 |
| Zeros (%) | 5.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 4 |
| median | 15 |
| Q3 | 22 |
| 95-th percentile | 23 |
| Maximum | 23 |
| Range | 23 |
| Interquartile range (IQR) | 18 |
Descriptive statistics
| Standard deviation | 8.405531312 |
|---|---|
| Coefficient of variation (CV) | 0.6272997989 |
| Kurtosis | -1.459755638 |
| Mean | 13.39954409 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | -0.319960327 |
| Sum | 258638 |
| Variance | 70.65295663 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=24)
| Value | Count | Frequency (%) |
| 23 | 2953 | |
| 22 | 2939 | |
| 1 | 1156 | 6.0% |
| 0 | 1143 | 5.9% |
| 3 | 1134 | 5.9% |
| 2 | 1106 | 5.7% |
| 16 | 607 | 3.1% |
| 19 | 599 | 3.1% |
| 13 | 595 | 3.1% |
| 18 | 595 | 3.1% |
| Other values (14) | 6475 |
| Value | Count | Frequency (%) |
| 0 | 1143 | |
| 1 | 1156 | |
| 2 | 1106 | |
| 3 | 1134 | |
| 4 | 386 | 2.0% |
| 5 | 375 | 1.9% |
| 6 | 367 | 1.9% |
| 7 | 378 | 2.0% |
| 8 | 356 | 1.8% |
| 9 | 409 | 2.1% |
| Value | Count | Frequency (%) |
| 23 | 2953 | |
| 22 | 2939 | |
| 21 | 574 | 3.0% |
| 20 | 571 | 3.0% |
| 19 | 599 | 3.1% |
| 18 | 595 | 3.1% |
| 17 | 586 | 3.0% |
| 16 | 607 | 3.1% |
| 15 | 594 | 3.1% |
| 14 | 579 | 3.0% |
day
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.023779919 |
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 3446 |
| Zeros (%) | 17.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.134026467 |
|---|---|
| Coefficient of variation (CV) | 0.7057479459 |
| Kurtosis | -1.390979879 |
| Mean | 3.023779919 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.03792912189 |
| Sum | 58365 |
| Variance | 4.554068961 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 3446 | |
| 6 | 3351 | |
| 5 | 2848 | |
| 1 | 2685 | |
| 4 | 2511 | |
| 3 | 2368 | |
| 2 | 2093 |
| Value | Count | Frequency (%) |
| 0 | 3446 | |
| 1 | 2685 | |
| 2 | 2093 | |
| 3 | 2368 | |
| 4 | 2511 | |
| 5 | 2848 | |
| 6 | 3351 |
| Value | Count | Frequency (%) |
| 6 | 3351 | |
| 5 | 2848 | |
| 4 | 2511 | |
| 3 | 2368 | |
| 2 | 2093 | |
| 1 | 2685 | |
| 0 | 3446 |
month
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.754999482 |
| Minimum | 1 |
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 150.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 7 |
| Q3 | 10 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.477986257 |
|---|---|
| Coefficient of variation (CV) | 0.5148758732 |
| Kurtosis | -1.196016523 |
| Mean | 6.754999482 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.0352673039 |
| Sum | 130385 |
| Variance | 12.0963884 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=12)
| Value | Count | Frequency (%) |
| 12 | 2313 | |
| 6 | 1741 | |
| 8 | 1694 | |
| 5 | 1674 | |
| 3 | 1664 | |
| 7 | 1563 | |
| 10 | 1547 | |
| 9 | 1513 | |
| 4 | 1416 | |
| 1 | 1413 | |
| Other values (2) | 2764 |
| Value | Count | Frequency (%) |
| 1 | 1413 | |
| 2 | 1360 | |
| 3 | 1664 | |
| 4 | 1416 | |
| 5 | 1674 | |
| 6 | 1741 | |
| 7 | 1563 | |
| 8 | 1694 | |
| 9 | 1513 | |
| 10 | 1547 |
| Value | Count | Frequency (%) |
| 12 | 2313 | |
| 11 | 1404 | |
| 10 | 1547 | |
| 9 | 1513 | |
| 8 | 1694 | |
| 7 | 1563 | |
| 6 | 1741 | |
| 5 | 1674 | |
| 4 | 1416 | |
| 3 | 1664 |
is_fraud
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 150.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 9651 | 50.0% |
| 0 | 9651 | 50.0% |
Length
Histogram of lengths of the category
Pie chart
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
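The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. The function names are illustrative, and giving tied values the average of their positions is an assumption (a common convention that the text does not spell out).

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n factors of the covariance and both variances cancel out.
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
rho = spearman(x, y)
r = pearson(x, y)
```

On this toy data ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.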
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
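As a rough illustration of the invariance property, the sketch below computes r for a handful of values and again after shifting and rescaling both variables. The function name `pearson_r` is illustrative. The inputs are the amt and hour values of the first five sample rows of this dataset, chosen only as convenient numbers; five points say nothing about the real correlation.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

amt = [281.06, 11.52, 276.31, 7.03, 275.73]  # amt of the first five sample rows
hour = [1, 1, 3, 3, 3]                       # hour of the same rows

r = pearson_r(amt, hour)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([2 * a + 3 for a in amt], [0.5 * h - 1 for h in hour])
```

Both calls return the same value up to floating-point error, since location and positive scale changes cancel in the ratio.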
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
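The pair-counting definition above translates directly into code. This is a sketch of that formula as stated (often called tau-a); the function name is illustrative. Library implementations commonly use a tie-adjusted variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 is a tie and counts toward neither tally
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau(x, y)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.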
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
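The data behind both nullity views can be sketched without any plotting: a boolean matrix marking which cells are present, and a per-column count of present values. The function names are illustrative, and the gaps in the toy rows are hypothetical, since this dataset reports 0 missing cells.

```python
def nullity_matrix(rows):
    """Map each cell to True if a value is present and False if it is missing (None)."""
    return [[value is not None for value in row] for row in rows]

def present_per_column(matrix, columns):
    """Count the present values in each column (the per-column bar heights)."""
    return {name: sum(row[i] for row in matrix) for i, name in enumerate(columns)}

columns = ["category", "amt", "zip"]
rows = [
    ("grocery_pos", 281.06, 28611),
    ("gas_transport", None, 78208),  # hypothetical gap, not present in the real data
    ("grocery_pos", 276.31, None),   # hypothetical gap, not present in the real data
]
matrix = nullity_matrix(rows)
counts = present_per_column(matrix, columns)
```

Rendering `matrix` as light and dark cells gives the nullity matrix, and plotting `counts` gives the per-column view; for the profiled data every cell would be present.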
First rows
| | category | amt | zip | lat | long | city_pop | merch_lat | merch_long | age | hour | day | month | is_fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | grocery_pos | 281.06 | 28611 | 35.9946 | -81.7266 | 885 | 36.430124 | -81.179483 | 31 | 1 | 2 | 1 | 1 |
| 1 | gas_transport | 11.52 | 78208 | 29.4400 | -98.4590 | 1595797 | 29.819364 | -99.142791 | 59 | 1 | 2 | 1 | 1 |
| 2 | grocery_pos | 276.31 | 78208 | 29.4400 | -98.4590 | 1595797 | 29.273085 | -98.836360 | 59 | 3 | 2 | 1 | 1 |
| 3 | gas_transport | 7.03 | 28611 | 35.9946 | -81.7266 | 885 | 35.909292 | -82.091010 | 31 | 3 | 2 | 1 | 1 |
| 4 | grocery_pos | 275.73 | 78208 | 29.4400 | -98.4590 | 1595797 | 29.786426 | -98.683410 | 59 | 3 | 2 | 1 | 1 |
| 5 | shopping_net | 844.80 | 28611 | 35.9946 | -81.7266 | 885 | 35.987802 | -81.254332 | 31 | 13 | 2 | 1 | 1 |
| 6 | misc_net | 843.91 | 28611 | 35.9946 | -81.7266 | 885 | 35.985612 | -81.383306 | 31 | 23 | 2 | 1 | 1 |
| 7 | gas_transport | 10.76 | 78208 | 29.4400 | -98.4590 | 1595797 | 28.856712 | -97.794207 | 59 | 1 | 3 | 1 | 1 |
| 8 | grocery_pos | 332.35 | 78208 | 29.4400 | -98.4590 | 1595797 | 29.320662 | -97.937219 | 59 | 1 | 3 | 1 | 1 |
| 9 | grocery_pos | 315.34 | 78208 | 29.4400 | -98.4590 | 1595797 | 28.953283 | -97.806528 | 59 | 3 | 3 | 1 | 1 |
Last rows
| | category | amt | zip | lat | long | city_pop | merch_lat | merch_long | age | hour | day | month | is_fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19292 | personal_care | 86.53 | 73754 | 36.3850 | -98.0727 | 1078 | 36.665223 | -98.247317 | 68 | 12 | 4 | 5 | 0 |
| 19293 | health_fitness | 66.64 | 38922 | 33.9215 | -89.6782 | 3451 | 33.417267 | -89.403580 | 36 | 23 | 0 | 6 | 0 |
| 19294 | gas_transport | 80.62 | 48088 | 42.5164 | -82.9832 | 134056 | 41.852968 | -83.273248 | 63 | 11 | 5 | 3 | 0 |
| 19295 | gas_transport | 48.32 | 17058 | 40.5553 | -77.4001 | 1909 | 40.264776 | -77.317657 | 66 | 0 | 2 | 12 | 0 |
| 19296 | misc_net | 74.77 | 46346 | 41.4802 | -86.6919 | 1423 | 41.023610 | -87.028942 | 22 | 12 | 6 | 12 | 0 |
| 19297 | gas_transport | 62.58 | 29455 | 32.8357 | -79.8217 | 20478 | 32.418947 | -78.873660 | 22 | 10 | 3 | 2 | 0 |
| 19298 | misc_net | 273.25 | 14711 | 42.3200 | -78.0943 | 1766 | 41.831789 | -77.151160 | 58 | 8 | 3 | 10 | 0 |
| 19299 | misc_pos | 2.74 | 53129 | 42.9373 | -87.9943 | 13973 | 42.604173 | -88.573357 | 38 | 7 | 4 | 12 | 0 |
| 19300 | shopping_net | 161.28 | 40077 | 38.4921 | -85.4524 | 564 | 38.225422 | -85.135045 | 23 | 16 | 1 | 12 | 0 |
| 19301 | kids_pets | 163.68 | 44233 | 41.2419 | -81.7453 | 7646 | 41.654965 | -80.811580 | 31 | 22 | 4 | 3 | 0 |